The Deep Web: Surfacing Hidden Value
Abstract
Traditional search engines create their indices by spidering or crawling surface-Web pages. To be discovered, a page must be static and linked from other pages. Traditional search engines cannot "see" or retrieve content in the deep Web: those pages do not exist until they are created dynamically as the result of a specific search. Because traditional search-engine crawlers cannot probe beneath the surface, the deep Web has heretofore been hidden.
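The limitation described above can be illustrated with a minimal link-following crawler over a toy in-memory "web". The page names and the form-generated URL below are hypothetical, chosen only to show that a page produced dynamically by a form submission is never reached by hyperlink traversal alone:

```python
from collections import deque

# A toy "surface web": each static page lists the pages it links to.
# "/results?q=widgets" exists only as the output of a search-form
# submission, so no static page links to it (hypothetical URLs).
PAGES = {
    "/home": ["/about", "/search"],
    "/about": ["/home"],
    "/search": [],             # the search-form page has no static out-links
    "/results?q=widgets": [],  # deep-Web page: reachable only via the form
}

def crawl(seed):
    """Breadth-first crawl that follows only static hyperlinks."""
    seen, queue = {seed}, deque([seed])
    while queue:
        url = queue.popleft()
        for link in PAGES.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen

indexed = crawl("/home")
# "/results?q=widgets" is never discovered, although it "exists"
# in the sense that the site can generate it on demand.
```

Everything reachable by links ends up in `indexed`; the dynamically generated results page does not, which is exactly the gap the deep Web represents.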
Similar Resources
A Comparative Study of Hidden Web Crawlers
A large amount of data on the WWW remains inaccessible to the crawlers of Web search engines because it can only be exposed on demand, as users fill out and submit forms. The Hidden Web refers to the collection of Web data that a crawler can access only through interaction with a Web-based search form, not simply by traversing hyperlinks. Research on the Hidden Web has emerged almost ...
Google's Deep Web crawl
The Deep Web, i.e., content hidden behind HTML forms, has long been acknowledged as a significant gap in search engine coverage. Since it represents a large portion of the structured data on the Web, accessing Deep-Web content has been a long-standing challenge for the database community. This paper describes a system for surfacing Deep-Web content, i.e., pre-computing submissions for each HTML...
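The surfacing idea mentioned in this abstract, pre-computing form submissions so that the resulting pages get ordinary crawlable URLs, can be sketched as follows. This is a simplified illustration, not the paper's system: the form action and field names are hypothetical, and only GET-style forms are modeled:

```python
from itertools import product
from urllib.parse import urlencode

def surface_urls(action, field_values):
    """Pre-compute GET submissions for an HTML form: one URL per
    combination of candidate input values. A sketch of the surfacing
    idea; `action` and the field names are hypothetical examples."""
    names = sorted(field_values)  # fixed field order for stable URLs
    for combo in product(*(field_values[n] for n in names)):
        yield action + "?" + urlencode(dict(zip(names, combo)))

urls = list(surface_urls("/search", {"q": ["cars", "books"], "lang": ["en"]}))
# → ["/search?lang=en&q=cars", "/search?lang=en&q=books"]
```

Each generated URL names a concrete result page, so a conventional crawler can fetch and index it like any static page.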
Deep Webpage Classification and Extraction (DWCE)
Since Deep Web (or Hidden Web) information is hidden behind search query forms, it can only be accessed by interacting with those forms. Therefore, an automated system that interacts with the search forms and extracts the hidden Web pages would be of great value to human users. To accomplish this task, this paper proposes a novel method, "Deep Webpage ...
A Bootstrapping Approach to Classification of Deep Web Query Interfaces
Classification of Deep Web sources is a very important step in the data-extraction process when accessing Deep Web content, since it deals with domain-specific data only. Existing methods cannot effectively classify these Web databases. Hence, to solve this problem, we propose a new framework that uses a bootstrapping approach for automatic and accurate classification of the query...
A Task-specific Approach for Crawling the Deep Web
There is a great amount of valuable information on the Web that cannot be accessed by conventional crawler engines. This portion of the Web is usually known as the Deep Web or the Hidden Web. Most probably, the information of highest value in the deep Web is that behind Web forms. In this paper, we describe a prototype hidden-Web crawler able to access such content. Our approach is b...
Publication date: 2000